Yelp.com has been a go-to directory and review site for many years. A four-star app on both Google Play and the Apple App Store, the company claims to be home to over 135 million reviews.
From the perspective of the consumer, positive reviews can lead to increased interest and patronage. The human phenomenon known as FOMO (the Fear Of Missing Out) keeps us alert to what others in our network enjoy taking part in and gives us a kind of social incentive for keeping up with the group. In contrast, poor reviews often signal unpleasant experiences that we may want to account for when choosing where to spend our money.
From the perspective of the business owner, online reviews have become the digitized “word-of-mouth”. The internet has made first-hand accounts of almost any product or company abundantly available. With recent studies showing 95% of shoppers read online reviews before making a purchase, it’s no wonder services like Yelp are becoming increasingly important in customer experience analysis. Any successful business must inspire loyalty in its customers; to do that, we must find a way to tap into the emotions our services evoke in our customers and the language they use when describing our business.
As a business owner, how can I get a snapshot of what my customers are saying on Yelp.com?
In this analysis, I try to provide a data-mining alternative to the manual, labor-intensive process of reading every review. Of course, there is a time for engaging with your customers and addressing their experiences online and in person. However, a primary goal of business intelligence is to bring together insightful and actionable information for key decision makers. To that end, this project aims to give an overall sense of reviews from Yelp.com for a pizza chain in Austin, TX.
As a broke college student, I must admit that pizza is a go-to for my colleagues and me. Yelp can help navigate the price points and experiences of competing pizzerias in the area. For this customer experience analysis, I will use the Yelp.com review URL for Via 313 Pizza in Austin, TX. I will then clean the text from multiple pages of reviews and construct a document-term matrix that will be used to provide text-based insight.
# Libraries
library(XML)
library(RCurl)
library(ggplot2)
library(syuzhet)
library(tm)
library(SnowballC)
library(wordcloud2)
# Yelp pizzeria URLs
viaURL <- 'https://www.yelp.com/biz/via-313-pizza-north-campus-austin' # 487 Reviews
Since I will be using the pizzeria comments from Yelp.com, I can use RCurl to download the review pages and XML to parse them and extract the review text. I then clean the text into a more readable format that can be easily measured: the reviews are converted to lower case, and numbers and punctuation are replaced with spaces. The code below ends by printing the first few cleaned reviews.
# Set URL
url <- viaURL
# Create empty data frame for reviews
reviewsDF <- data.frame()
# Get data and convert to data frame
reviewsPage <- getURL(url)
parsedReviews <- htmlParse(reviewsPage)
reviews <- xpathSApply(parsedReviews, '//div[@itemtype="http://schema.org/Review"]', xmlValue)
reviewsDF <- data.frame(reviews)
# Set number of additional pages for more reviews (20 per page)
for (i in 1:2) {
  # Build the URL for the next page; ?start= is the offset of the first review
  url <- paste0(viaURL, '?start=', i * 20)
  reviewsPage <- getURL(url)
  parsedReviews <- htmlParse(reviewsPage)
  reviews <- xpathSApply(parsedReviews, '//div[@itemtype="http://schema.org/Review"]', xmlValue)
  # Append this page's reviews to the running data frame
  reviews <- data.frame(reviews)
  reviewsDF <- rbind(reviewsDF, reviews)
}
# catch.error() function to check and convert to lower case
catch.error <- function(x) {
  y <- NA
  catch_error <- tryCatch(tolower(x), error = function(e) e)
  if (!inherits(catch_error, "error"))
    y <- tolower(x)
  return(y)
}
# cleanReviews() function removes superfluous characters
cleanReviews <- function(review) {
  review <- gsub("\n", " ", review)
  review <- gsub("[[:punct:]]", " ", review)
  review <- gsub("[[:digit:]]", " ", review)
  review <- iconv(review, "UTF-8", "ascii", sub = '')
  review <- catch.error(review)
  review
}
# cleanReviewsAndRemoveNAs() function removes NA or duplicates
cleanReviewsAndRemoveNAs <- function(allReviews) {
  allReviewsCleaned <- sapply(allReviews, cleanReviews)
  allReviewsCleaned <- allReviewsCleaned[!is.na(allReviewsCleaned)]
  names(allReviewsCleaned) <- NULL
  allReviewsCleaned <- unique(allReviewsCleaned)
  allReviewsCleaned
}
# Clean the reviews
reviewsCleaned <- cleanReviewsAndRemoveNAs(reviewsDF)
head(reviewsCleaned)
[1] " this was first time experiencing detroit style pizza i ve had chicago style new york style and papa murphy s let me tell you this pizza blows them all out of the park the crispy buttery sides make this pizza complete the ambassador bridge is one of the best pizzas i ve ever had and this place is so cool because they allow you to do a half and half pizza my other half was the pineapple does belong on pizza with jalapenos other than the pizza being incredible the customer service was out of this world every time my family is in town i ve taken them their and were treated with a taste of a dessert or a sample of an appetizer they even gave me a birthday card on my birthday who does that easily one of my favorite places in austin "
[2] " fair warning i am a pineapple jalapeno kinda gal i have been to detroit quite a few times now and that is probably the only reason i did not think this place it was as great as many others but it was pretty great for detroit style pizza not in detroit i will say the reason i thought it was not up to par for me was because it was a little too saucy and a little too cheesy don t judge the only reason that was a problem for me was because i love detroit style pizza for the crust and too much cheese and sauce will take away from the crunch of the pizza i also was not a fan of how the pineapples were rounds i understand the aesthetic was better but i did not get a piece of pineapple in each bite which was a real bummer the whole pizza was kinda falling apart but the flavors of the crust and sauce were definitely up to par "
[3] " detroit style pizza in austin now we re talking via s rise in the city by providing delicious square pizzas has catapulted them to new heights they ve managed to open several locations in just the last few years but the north campus location is the original and by far the most popular i have to admit i am a sucker for square slices especially ones like detroit style having had the real thing myself i can vouch and say this is the real deal the sauce is a little on the sweet side but i love the crunch of the crust and how amazing the taste is especially out of the oven the toppings are a little on the fancier side than one would find in a classic detroit pizza parlor but via expands on the variety and really provides combinations that pack a punch for all palates there is a soft spot in my stomach for the carnivore which could probably be more abundant with the toppings but it is truly the ultimate pizza to order here right next to it is the ambassador bridge which is basically the carnivore with added chopped garlic coming in a very close second is the cadillac which has gorgonzola fig preserves prosciutto di parma parmesan and balsamic glaze a classic is the detroiter which includes two kinds of pepperoni one smoked under the cheese and naturally very curly pepperoni it s so good an square aka x runs anywhere from where as the square aka x is meant to be a personal size runs from no one ever said pizza was cheap especially the good stuff service is usually some college aged person taking orders behind the counter and most of the staff is pretty hip while it s understandably not open as late as the east side location i still give this north campus a big thumbs up is there such a thing as too many times "
[4] " loved it loved it loved it if you are looking for fabulous tasting gf pizza this is it their crust takes like the regular buttery thick yummy crust that i no longer eat the other reviews on yelp are true this place is so good we look forward to more visits this place was a hit for the whole family there was something for everyone we went to the location close to ut campus on a sunday for dinner when the students were out so there was no wait for dinner time however i wouldn t expect that to be the case when school is back in session "
[5] " ugh i love you via best pizza i ve ever had hands down my favorite is the cadillac which has gorgonzola and fig and sometimes i add ricotta to it which makes it super extra yummy also their ranch idk what the hell they do to it but man it s the best ranch i ve ever had i could drink this ranch definitely get an order of cheese sticks with ran "
[6] " detroit styled pizza in austin not sure what that even means but sign me up i m usually not a fan of thick pizza because it leaves me feeling extremely sleepy and heavy but via did a great job we ordered the hawaiian and the carnivore both delicious the italian salad was basic and fresh and we were offered free cheese bread by our server overall a great meal "
Now we have clean text data to work with!
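To see what the cleaning steps do without contacting Yelp, here is a minimal sketch that applies the same substitutions to a single made-up review. The string `rawReview` is hypothetical, and the final whitespace-collapsing step is an optional addition that is not part of the pipeline above.

```r
# Hypothetical raw review text (made up for illustration, not taken from Yelp)
rawReview <- "Best pizza I've had in 2019!\nWould go again: 10/10."

# Same steps as cleanReviews(): replace newlines, punctuation, and digits
# with spaces, drop non-ASCII characters, then convert to lower case
cleaned <- gsub("\n", " ", rawReview)
cleaned <- gsub("[[:punct:]]", " ", cleaned)
cleaned <- gsub("[[:digit:]]", " ", cleaned)
cleaned <- iconv(cleaned, "UTF-8", "ASCII", sub = "")
cleaned <- tolower(cleaned)

# Optional extra step (an assumption, not in the original pipeline):
# collapse runs of whitespace and trim both ends
cleanedTidy <- trimws(gsub("\\s+", " ", cleaned))
print(cleanedTidy)
```

Because punctuation is replaced with a space rather than deleted, contractions such as "I've" come out as "i ve", which is why fragments like "i ve" and "don t" appear in the cleaned reviews printed above.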
To chart customer experiences by emotion, I can use the syuzhet package, which scores the words in each review against the NRC emotion lexicon. After counting how many reviews register each emotional category at least once, we can generate a bar chart of the customer emotions in Via 313 Pizza reviews.
# Obtain emotion frequency per review
reviewsEmotions <- get_nrc_sentiment(as.character(reviewsCleaned))
reviewsEmotionsDF <- t(data.frame(reviewsEmotions))
# Calculate number of reviews with each emotion > 0
reviewsEmotionsDFCount <- data.frame(rownames(reviewsEmotionsDF),
                                     rowSums(reviewsEmotionsDF > 0))
rownames(reviewsEmotionsDFCount) <- NULL
colnames(reviewsEmotionsDFCount) <- c('Emotion', 'Frequency')
# Barplot of review sentiment
par(mar = c(3, 6, 1, 1))
barplot(reviewsEmotionsDFCount$Frequency,
        names.arg = reviewsEmotionsDFCount$Emotion,
        col = "lightblue",
        horiz = TRUE,
        main = "Customer Emotions - Via 313 Pizza",
        cex.names = 1,
        las = 2)
As we can see from the chart, the leading emotions in customer reviews are positive. Reviewers most often signal trust, joy, and anticipation when describing their experiences at Via 313 Pizza.
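Note that the chart counts the number of reviews in which an emotion appears at least once, which is different from the total number of emotion-word matches. The sketch below illustrates the difference on a small made-up score table; `toyEmotions` and its values are hypothetical and only mimic the one-row-per-review, one-column-per-category shape described above.

```r
# Hypothetical per-review emotion scores (made up for illustration):
# one row per review, one column per emotion category
toyEmotions <- data.frame(
  joy   = c(3, 0, 1),
  trust = c(2, 1, 0),
  anger = c(0, 0, 4)
)

# Number of reviews in which each emotion appears at least once
# (the quantity plotted in the bar chart above)
reviewsWithEmotion <- colSums(toyEmotions > 0)

# Total emotion-word matches across all reviews (an alternative measure)
totalMatches <- colSums(toyEmotions)

print(rbind(reviewsWithEmotion, totalMatches))
```

Counting reviews keeps a single long, emotional review from dominating the chart, while summing matches would weight each review by how many emotion words it contains. Either choice is defensible; it helps to state which one a chart shows.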
Next, we can use the tm package to take a closer look at the most frequent words used in the reviews and generate a bar chart of those terms.
# Create text corpus of reviews
reviews_corp <- Corpus(VectorSource(reviewsCleaned))
# Remove stopwords and create document-term matrix
reviews_DTM <- DocumentTermMatrix(reviews_corp,
                                  control = list(stopwords = TRUE))
# Find frequent terms (cutoff frequency is 50)
findFreqTerms(reviews_DTM, lowfreq = 50)
# Sum each term's frequency across all reviews, then sort in descending order
reviews_FreqTerms <- sort(colSums(as.matrix(reviews_DTM)), decreasing = TRUE)
# Create data frame with two columns: word and frequency
reviews_FreqTermsDF <- data.frame(word = names(reviews_FreqTerms),
                                  freq = reviews_FreqTerms)
row.names(reviews_FreqTermsDF) <- NULL
# Plot the ten most frequently used words in reviews
barplot(reviews_FreqTermsDF[1:10, ]$freq,
        las = 2,
        names.arg = reviews_FreqTermsDF[1:10, ]$word,
        col = "lightblue",
        main = "Most Frequent Words - Via 313 Pizza")
From the chart we can see that Via 313 Pizza is praised for its Detroit-style pizza. People seem to love its crust, cheese, and service. Moreover, it is often praised as a place where customers had a great time or experience.
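For readers unfamiliar with document-term matrices, the following base R sketch builds a tiny one by hand and sums its columns to get term frequencies. It is only an illustration of the idea, not how tm works internally; `toyReviews` and `toyStopwords` are made up, and tm's built-in stop word list is much longer.

```r
# Hypothetical cleaned reviews (made up for illustration, not taken from Yelp)
toyReviews <- c("great pizza and great crust",
                "the crust was crispy",
                "pizza was great")

# A tiny illustrative stop word list (an assumption for this sketch)
toyStopwords <- c("and", "the", "was")

# Split each review into words and drop the stop words
tokens <- strsplit(toyReviews, " ", fixed = TRUE)
tokens <- lapply(tokens, function(words) words[!words %in% toyStopwords])

# Build a small document-term matrix: one row per review, one column per term
terms <- sort(unique(unlist(tokens)))
toyDTM <- t(vapply(tokens,
                   function(words) as.vector(table(factor(words, levels = terms))),
                   integer(length(terms))))
colnames(toyDTM) <- terms

# Column sums give each term's total frequency across all reviews
toyFreq <- sort(colSums(toyDTM), decreasing = TRUE)
print(toyDTM)
print(toyFreq)
```

Each cell of `toyDTM` holds how often a term occurs in one review, so summing down the columns gives the overall counts that feed the bar chart and the word cloud.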
Since I already have a word frequency data frame, I can use wordcloud2 to visualize the term frequencies as a word cloud.
# Create word cloud
wordcloud2(data = reviews_FreqTermsDF, size = 1.5)
Via 313 word cloud
Via 313 Pizza seems to provide customers with a satisfying experience, combining good food with a pleasant environment. Its signature Detroit-style pizza and outstanding service have helped it earn 4.5 out of 5 stars.
To make this kind of analysis more impactful for the business, I would recommend making snapshots of customer reviews a monthly or even weekly business practice, depending on how often new reviews come in. Additionally, I would recommend incorporating many more sources of reviews, such as Twitter, Facebook, and Google. These modifications can give any business a good idea of what its customers are talking about, and the business can then use this information to make positive changes in how it serves them.